Advanced Statistical Inference II

Chelsea Parlett-Pelleriti

Predictive Checks

Predictive Distributions

🤔 How do we use distributions of parameters (prior or posterior) to make predictions?

Predictive Distributions

Two sources of variability:

  1. Distributional (Prior or Posterior) Variability

  2. Sampling (Data) Variability

Predictive Distributions

Distributional Variability

How certain are we about the value of the parameters?

Predictive Distributions

Sampling Variability

How much will observed data typically vary around their expected value? In other words, the inherent variability of the data around the true value.

Predictive Distributions

\[ p(\tilde{x} \mid x) \]

Probability of new sample \(\tilde{x}\) given the current data \(x\).

  1. draw \(\theta_i \sim \pi(\theta)\) (one sample of all the model parameters)

  2. draw \(\tilde{x}_i \sim p(\tilde{x} \mid \theta_i)\) (a sample of data from the model implied by \(\theta_i\))

Step 1 accounts for uncertainty about the parameters \(\theta\); step 2 accounts for sampling variability in the data.
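The two-step recipe can be sketched by hand for a toy model; the prior, likelihood, and all numbers here are made up for illustration:

```r
# Toy model: prior mu ~ Normal(0, 2), likelihood y ~ Normal(mu, 1)
set.seed(1)
n_draws  <- 10000
mu_draws <- rnorm(n_draws, mean = 0, sd = 2)        # step 1: parameter draws
y_draws  <- rnorm(n_draws, mean = mu_draws, sd = 1) # step 2: data draws
# the predictive variance combines both sources: 2^2 + 1^2 = 5
var(y_draws)
```

With 10,000 draws, `var(y_draws)` lands near 5, the sum of the distributional and sampling variances.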

Predictive Distributions

Two sources of variability:

  1. Distributional (Prior or Posterior) Variability

  2. Sampling (Data) Variability

Prior Predictive Check

Prior predictive checks let us simulate data from our prior and check that it lines up with our expectations about reality.

# prior-only model: sample from the priors, not the posterior
priors <- c(
  prior("normal(0,5)", class = "b"),
  prior("normal(0,4)", class = "Intercept"),
  prior("gamma(0.1, 10)", class = "sigma")
)
mod_0 <- brm(y ~ x, data = df, prior = priors, sample_prior = "only") 
mod_0 |> pp_check()

Prior Predictive Check

Let’s adjust our priors to be more realistic.

# tighter, more realistic priors
priors <- c(
  prior("normal(0,1)", class = "b"),
  prior("normal(0,1)", class = "Intercept"),
  prior("gamma(1, 1)", class = "sigma")
)
mod_1 <- brm(y ~ x, data = df, prior = priors, sample_prior = "only")
mod_1 |> pp_check()

Posterior Predictive Check

Posterior predictive checks do something similar, but simulate new data based on the posterior. This lets us “look for systematic discrepancies between real and simulated data” (Gelman et al. 2004), which tells us a little about model fit.

# full model: condition on the data
priors <- c(
  prior("normal(0,1)", class = "b"),
  prior("normal(0,1)", class = "Intercept"),
  prior("gamma(1, 1)", class = "sigma")
)
m1 <- brm(y ~ x, data = df, prior = priors) 
m1 |> pp_check()

Posterior P-Values

Remember, p-values tell you:

given a distribution \(\pi(\theta)\), what is the probability of observing a \(\theta\) as or more extreme than the one we did? In other words, how consistent is \(\theta\) with \(\pi(\theta)\)?

# generate simulated datasets from the posterior
yrep <- posterior_predict(m1, ndraws = 500)

# test statistic: the overall mean
overall_mu <- function(x) mean(x)

# calc for actual data
overall_mu(df$y) 
[1] 0.236782
# calc for posterior samples
ppc_stat(df$y, yrep, stat = "overall_mu", binwidth = 0.005)
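The tail area that `ppc_stat` visualizes can also be computed directly. This sketch uses simulated stand-ins for `df$y` and `yrep` rather than the fitted model:

```r
# Stand-ins for the observed data and the matrix of posterior simulations
set.seed(2)
y    <- rnorm(50, mean = 0.2)
yrep <- matrix(rnorm(500 * 50, mean = 0.2), nrow = 500, ncol = 50)

t_obs <- mean(y)               # statistic on the real data
t_rep <- apply(yrep, 1, mean)  # same statistic on each simulated dataset

# posterior predictive p-value: share of simulations as or more extreme
p_b <- mean(t_rep >= t_obs)
```

Values of `p_b` very near 0 or 1 flag a systematic mismatch between the model and the data on this statistic.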

Bayesian Regression in R

mtcars Data

library(tidyverse)
data(mtcars)
glimpse(mtcars)
Rows: 32
Columns: 11
$ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
$ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
$ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
$ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
$ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
$ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…

General Model Setup

mod1 <- brm(mpg ~ hp + gear, data = mtcars) 
prior_summary(mod1)
                   prior     class coef group resp dpar nlpar lb ub       source
                  (flat)         b                                       default
                  (flat)         b gear                            (vectorized)
                  (flat)         b   hp                            (vectorized)
 student_t(3, 19.2, 5.4) Intercept                                       default
    student_t(3, 0, 5.4)     sigma                             0        default

General Model Setup

priors <- c(
  prior("normal(0,5)", class = "b")
)

mod2 <- brm(mpg ~ hp + gear, data = mtcars,
            prior = priors) 
prior_summary(mod2)
                   prior     class coef group resp dpar nlpar lb ub       source
             normal(0,5)         b                                          user
             normal(0,5)         b gear                            (vectorized)
             normal(0,5)         b   hp                            (vectorized)
 student_t(3, 19.2, 5.4) Intercept                                       default
    student_t(3, 0, 5.4)     sigma                             0        default
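As a quick sanity check on normal(0, 5) as a slope prior, consider what it implies across the observed hp range (roughly 52 to 335 in mtcars). The arithmetic below is back-of-the-envelope only:

```r
# 95% of prior slopes fall within about +/- 1.96 * 5 of zero
slope_95 <- qnorm(c(0.025, 0.975), mean = 0, sd = 5)
# implied swing in predicted mpg across the observed hp range
slope_95 * (335 - 52)
```

Slopes near ±10 would move predictions by thousands of mpg over that range, so this prior is weakly informative at best.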

Prior Predictive Check

mod2_pp <- brm(mpg ~ hp + gear, data = mtcars,
            prior = priors, sample_prior = "only") 
mod2_pp |> pp_check()

Summary

summary(mod2)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: mpg ~ hp + gear 
   Data: mtcars (Number of observations: 32) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    18.08      3.32    11.34    24.61 1.00     4653     2268
hp           -0.06      0.01    -0.08    -0.05 1.00     4534     2744
gear          3.09      0.78     1.63     4.66 1.00     4503     2274

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     3.21      0.43     2.50     4.19 1.00     3367     2587

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Convergence

library(bayesplot)
mcmc_trace(mod2)

Posterior Predictive Check

mod2 |> pp_check()

Posterior Examination

mcmc_areas(mod2,
           area_method = "scaled height",
           pars = c("b_hp",
                    "b_gear")) + 
  geom_vline(xintercept = c(-0.1,0.1),
             color = "red",
             linetype = "dashed") # ROPE

(more on ROPE with a brms model here)
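The ROPE idea can be quantified as the share of posterior draws falling inside the interval. This sketch uses simulated draws matching the summary's point estimate for `b_hp` (mean -0.06, sd 0.01) instead of extracting real draws with `as_draws_df()`:

```r
set.seed(3)
b_hp <- rnorm(4000, mean = -0.06, sd = 0.01)  # stand-in posterior draws
# proportion of the posterior inside the ROPE (-0.1, 0.1)
in_rope <- mean(b_hp > -0.1 & b_hp < 0.1)
in_rope
```

Nearly all draws fall inside the ROPE here, suggesting the hp effect is practically negligible on this scale.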

Bayesian Logistic Regression in R

# s/o to ChatGPT for helping simulate the data
# libs
library(brms)
library(bayesplot)
library(tidybayes)

# sim data
set.seed(540)
n <- 200
parental_income <- rnorm(n, mean = 50, sd = 10) # income_in_k
z <- 1 + 0.05 * parental_income + rnorm(n, 0, 1)
p <- 1 / (1 + exp(-z))
passed_exam <- rbinom(n, 1, p) 
df <- data.frame(parental_income, passed_exam)

# priors
priors <- c(
  prior(normal(0, 5), class = "b"),
  prior(normal(0, 5), class = "Intercept")
)

# fit
fit <- brm(passed_exam ~ parental_income, data = df,
           family = bernoulli(), prior = priors,
           seed = 123, chains = 4, cores = 4, iter = 2000)
# summary
summary(fit)
 Family: bernoulli 
  Links: mu = logit 
Formula: passed_exam ~ parental_income 
   Data: df (Number of observations: 200) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           1.30      1.69    -2.08     4.61 1.00     2293     2106
parental_income     0.04      0.03    -0.03     0.11 1.00     2132     1965

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
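On the logit scale the coefficients are hard to read; exponentiating turns them into odds ratios. The value 0.04 below is the point estimate from the summary above (normally you would pull it from `fixef(fit)`):

```r
b_income <- 0.04                # point estimate for parental_income
odds_ratio <- exp(b_income)
odds_ratio  # each extra $1k of parental income multiplies the odds by ~1.04
```

Note that the 95% CI for this coefficient crosses zero, so the odds ratio's interval similarly crosses 1.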
# convergence
mcmc_trace(fit) 

# pp_check
pp_check(fit)

# plot params
mcmc_areas(fit,
           pars = c("b_parental_income"))

Bayesian Mixed Effect Models

# s/o to Chat GPT for helping simulate the data
# Load required libraries
library(brms)
library(bayesplot)
library(tidybayes)

# sim
set.seed(123)
n_schools <- 10
n_students <- 20
total_n <- n_schools * n_students

school <- factor(rep(1:n_schools, each = n_students))
parental_income <- rnorm(total_n, mean = 50, sd = 10) # Parental income in thousands of dollars
school_effect <- rnorm(n_schools, 0, 5)[school]
individual_error <- rnorm(total_n, 0, 5)
exam_score <- 50 + 0.5 * parental_income + school_effect + individual_error

df <- data.frame(school, parental_income, exam_score)

# priors
priors <- c(
  prior(normal(0, 10), class = "b"),
  prior(normal(50, 10), class = "Intercept"),
  prior(exponential(1), class = "sd")
)

# fit
fit <- brm(exam_score ~ parental_income + (1 + parental_income | school),
           data = df, family = gaussian(), prior = priors,
           seed = 123, chains = 4, cores = 4, iter = 2000)
# summary
summary(fit)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: exam_score ~ parental_income + (1 + parental_income | school) 
   Data: df (Number of observations: 200) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~school (Number of levels: 10) 
                               Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept)                      1.03      0.97     0.03     3.66 1.00
sd(parental_income)                0.11      0.03     0.06     0.19 1.00
cor(Intercept,parental_income)    -0.02      0.55    -0.94     0.93 1.01
                               Bulk_ESS Tail_ESS
sd(Intercept)                      2631     1886
sd(parental_income)                1048     1628
cor(Intercept,parental_income)      348     1082

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          51.74      1.99    47.84    55.71 1.00     5261     2955
parental_income     0.48      0.05     0.37     0.58 1.00     2121     1788

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     4.94      0.26     4.45     5.47 1.00     4369     2284

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
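A quick way to gauge how much of the outcome variance sits between schools is an intraclass correlation built from the variance components. The numbers below are the point estimates from the summary above (ignoring posterior uncertainty):

```r
sd_school <- 1.03  # sd(Intercept) across schools
sigma     <- 4.94  # residual sd
icc <- sd_school^2 / (sd_school^2 + sigma^2)
icc  # ~0.04: little between-school intercept variation relative to noise
```

A fuller version would compute this per posterior draw to get a credible interval for the ICC.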
# convergence
mcmc_trace(fit) 

# pp_check
pp_check(fit)

# plot
mcmc_areas(fit, pars = c("b_parental_income"))